The following data describes building damage from the New Orleans tornado of December 2022. The data was obtained from Verisk Analytics and was derived using computer vision and machine learning applied to post-catastrophe aerial imagery. There are approximately 42,000 buildings in this dataset.
Here are before-and-after photos of three buildings:
Building 1: catastrophe score of 100
Building 2: catastrophe score of 100
Building 3: catastrophe score of 60
I converted roofsolar into a logical (TRUE/FALSE) variable by mapping “SOLAR PANEL” to TRUE and “NO SOLAR PANEL” to FALSE. I also set roofshape to NA wherever the classifier's confidence score (roofshascr) was below 0.80, that is, wherever there was more than a 20% chance of the shape being incorrect. Some cells in damage_level contained an empty string, so I converted those to NA as well. Finally, I split the rooftop coordinates into separate longitude and latitude columns so that they could be read easily into leaflet.
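library(tidyverse) # provides %>%, mutate(), case_when(), separate(), and str_to_title()
df <- read.csv("clean_data.csv") %>%
  janitor::clean_names() %>%
  # Recode the solar-panel labels as logical; any other value becomes NA
  mutate(roofsolar = case_when(
    roofsolar == "SOLAR PANEL" ~ TRUE,
    roofsolar == "NO SOLAR PANEL" ~ FALSE
  )) %>%
  # Discard roof shapes whose confidence score is below 0.80
  mutate(roofshape = ifelse(roofshascr < 0.80, NA, roofshape)) %>%
  select(-c(roofshascr, roofcondit_discolordetect, roofcondit_discolorscore, roofcondit_discolorpercen, trampscr, roofcondit_tarppercen))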
# Strip the "POINT (" wrapper so only "long lat" remains, then split into two numeric columns
df$rooftopgeo <- gsub("POINT \\(|\\)", "", df$rooftopgeo)
df <- df %>%
  separate(rooftopgeo, into = c("long", "lat"), sep = " ", convert = TRUE)
# Treat empty damage levels as missing
df$damage_level <- ifelse(df$damage_level == "", NA, df$damage_level)
df$roofshape <- factor(df$roofshape, levels = c("gable", "hip", "flat"))
# "gravel" is not among the levels, so gravel roofs become NA
levels_roofmateri <- c("metal", "shingle", "membrane", "shake", "tile")
df$roofmateri <- factor(df$roofmateri, levels = levels_roofmateri)
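To make the coordinate step concrete, here is a minimal sketch on two made-up rows. It assumes rooftopgeo holds text of the form "POINT (long lat)", which is what the gsub() pattern above implies; the toy data frame and its coordinates are illustrative only.

```r
library(tidyr)

# Toy rows in the same "POINT (long lat)" text format (coordinates are made up)
toy <- data.frame(rooftopgeo = c("POINT (-90.07 29.95)", "POINT (-90.10 29.97)"))

# Remove the "POINT (" prefix and the closing ")" so only "long lat" remains
toy$rooftopgeo <- gsub("POINT \\(|\\)", "", toy$rooftopgeo)

# Split on the space; convert = TRUE turns the two pieces into numeric columns
toy <- separate(toy, rooftopgeo, into = c("long", "lat"), sep = " ", convert = TRUE)

str(toy)
```

Longitude comes first in this format, so the order of the names passed to into matters.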
I grouped the buildings by catastrophe score, choosing the cut points with reference to the summary statistics of the nonzero scores.
mostdamage <- df %>% filter(catastrophescore >= 50)
nodamage <- df %>% filter(catastrophescore == 0)
decimated <- df %>% filter(catastrophescore == 100)
middamage <- df %>% filter(catastrophescore < 50 & catastrophescore >= 15)
leastdamage <- df %>% filter(catastrophescore < 15 & catastrophescore >= 2)
minimaldamage <- df %>% filter(catastrophescore == 1)
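As an alternative to separate filtered data frames, the same groups could be stored as one factor column. The sketch below uses base R's cut() on made-up scores; the group labels are hypothetical names chosen to mirror the object names above, and the 100-point "decimated" group is left as part of "most".

```r
# Made-up scores that touch every boundary used above
scores <- c(0, 1, 2, 14, 15, 49, 50, 100)

# right = FALSE closes each interval on the left:
# [0,1), [1,2), [2,15), [15,50), [50,Inf)
damage_group <- cut(
  scores,
  breaks = c(0, 1, 2, 15, 50, Inf),
  labels = c("none", "minimal", "least", "mid", "most"),
  right = FALSE
)

table(damage_group)
```

A single factor like this can be passed directly to a model or used to colour a map, whereas the filtered data frames are convenient when each group is mapped on its own.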
Function for adding labels to the maps:
create_popup <- function(data) {
  paste("<b>Location</b><br>",
        " Longitude: ", data$long, "<br>",
        " Latitude: ", data$lat, "<br>",
        "<b>Catastrophe Score</b><br>",
        " Score: ", data$catastrophescore, "<br>",
        "<b>Roof Shape</b><br>",
        " Shape: ", str_to_title(data$roofshape), "<br>",
        "<b>Roof Material</b><br>",
        " Material: ", str_to_title(data$roofmateri), "<br>")
}
r_med_sq_err <- function(model, absolute = FALSE) { # absolute = TRUE returns the median absolute error
  if (inherits(model, c("glm", "lm"))) {
    if (absolute) {
      # Return early; otherwise this value would be discarded
      return(median(abs(residuals(model))))
    }
    sqrt(median(residuals(model)^2))
  } else {
    if (inherits(model, "list")) {
      stop("Did you provide a list of models? Use map() instead.")
    }
    stop("'model' must be either a glm or lm object.")
  }
}
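A short usage sketch for the error function, fitted to the built-in mtcars data rather than the tornado data. To keep the example runnable on its own, it defines a condensed stand-in for r_med_sq_err() with the same two return values; the toy models are assumptions for illustration only.

```r
# Condensed stand-in for the function above so this example runs by itself
r_med_sq_err <- function(model, absolute = FALSE) {
  stopifnot(inherits(model, c("glm", "lm")))
  res <- residuals(model)
  if (absolute) median(abs(res)) else sqrt(median(res^2))
}

# Toy models on a built-in dataset
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- glm(mpg ~ wt + hp, data = mtcars)

r_med_sq_err(fit1)                  # root median squared error
r_med_sq_err(fit1, absolute = TRUE) # median absolute error

# For several models, apply the function to each element of a list
errs <- purrr::map_dbl(list(fit1, fit2), r_med_sq_err)
errs
```

Because the median ignores the largest residuals, this measure is less sensitive to a few badly predicted buildings than the root mean squared error.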
This map shows all of the damage points; the vast majority of roofs have no damage.
These are the buildings that sustained damage.
Red indicates the most damaged buildings (catastrophe score >= 50), orange indicates moderately damaged buildings (25 < catastrophe score < 50), and blue indicates the least damaged buildings (catastrophe score <= 25, excluding scores of 0). The majority of the buildings (3852) exhibited a catastrophe score of 0.
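The mapping code is not shown in this report, so the following is only a sketch of how the colour scheme described above could be drawn with leaflet. The toy data frame and the score_color() helper are hypothetical; the thresholds are the ones stated in the text.

```r
library(leaflet)

# Made-up buildings; coordinates and scores are illustrative only
toy <- data.frame(
  long = c(-90.07, -90.08, -90.09),
  lat = c(29.95, 29.96, 29.97),
  catastrophescore = c(80, 30, 5)
)

# Hypothetical helper: red for >= 50, orange for 25 < score < 50, blue otherwise
score_color <- function(score) {
  ifelse(score >= 50, "red", ifelse(score > 25, "orange", "blue"))
}

m <- leaflet(toy) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~long, lat = ~lat,
    color = ~score_color(catastrophescore),
    popup = ~paste("Score:", catastrophescore),
    radius = 4
  )
m
```

Buildings with a score of 0 would need to be filtered out first, since the helper would otherwise colour them blue.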
Map of the buildings that experienced damage:
Map of the buildings that experienced the most damage:
Map of the buildings that experienced mid damage:
Map of the buildings that experienced the least damage:
Map of the buildings that experienced no damage:
Map of the buildings that experienced no damage and the most damage:
Since most of the buildings in this dataset were not damaged by the tornado, the distribution of catastrophe scores is heavily skewed toward 0. This can be seen in the summary below:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 2.217 0.000 100.000
## # Comparison of Model Performance Indices
##
## Name | Model | AIC (weights) | AICc (weights) | BIC (weights) | R2 | RMSE | Sigma
## --------------------------------------------------------------------------------------------
## mod1 | glm | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 0.007 | 10.780 | 10.781
## mod2 | glm | 3.2e+05 (<.001) | 3.2e+05 (<.001) | 3.2e+05 (<.001) | 0.019 | 11.454 | 11.455
## mod3 | glm | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 0.007 | 10.780 | 10.781
## mod4 | glm | 10240.8 (>.999) | 10240.9 (>.999) | 10277.4 (>.999) | 0.020 | 10.110 | 10.132
## mod5 | glm | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 0.006 | 10.764 | 10.765
## mod6 | glm | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 0.007 | 10.780 | 10.781
## [1] 11.55021
## [1] 10.76414
Because of this, I fit models that excluded catastrophe scores of 0 in order to focus on the structures that experienced damage. Below is the summary for the structures that exhibited damage:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 15.00 28.64 46.00 100.00
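The effect of the zeros on the summary can be reproduced with a small made-up vector. This is only an illustration of the idea; the values below are not taken from the dataset.

```r
# Made-up scores: 90 undamaged buildings and 10 damaged ones
scores <- c(rep(0, 90), 2, 4, 4, 15, 15, 20, 46, 60, 100, 100)

mean(scores == 0)            # share of buildings with a score of 0
summary(scores)              # quartiles are all 0, so only the mean and max move
summary(scores[scores > 0])  # summary restricted to damaged buildings
```

When the quartiles are all 0, a model fit to every building is mostly learning to predict 0, which is the motivation for modelling the damaged buildings separately.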
Models
## # Comparison of Model Performance Indices
##
## Name | Model | AIC (weights) | AICc (weights) | BIC (weights) | R2 | RMSE | Sigma
## ---------------------------------------------------------------------------------------------
## mods1 | glm | 26272.9 (<.001) | 26272.9 (<.001) | 26314.3 (<.001) | 0.095 | 28.657 | 28.689
## mods2 | glm | 26272.9 (<.001) | 26272.9 (<.001) | 26314.3 (<.001) | 0.095 | 28.657 | 28.689
## mods3 | glm | 26094.1 (>.999) | 26094.1 (>.999) | 26147.3 (>.999) | 0.147 | 27.816 | 27.857
## mods4 | glm | 30925.4 (<.001) | 30925.5 (<.001) | 30968.0 (<.001) | 0.156 | 28.795 | 28.821
## mods5 | glm | 30773.9 (<.001) | 30773.9 (<.001) | 30828.6 (<.001) | 0.176 | 28.402 | 28.437
Of the models I fit, Model 5 (mods5) appeared to work best, with the highest R2 (0.176), although Model 3 (mods3) had the lowest AIC. It should be noted that none of these models fit particularly well with the variables used: every R2 is below 0.18.
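The layout of the comparison tables matches the output of compare_performance() from the performance package. Assuming that is how they were produced, a minimal sketch with two toy Gaussian GLMs on the built-in mtcars data looks like this; the models are stand-ins, not the damage models.

```r
library(performance)

# Toy Gaussian GLMs; stand-ins for the damage models
m1 <- glm(mpg ~ wt, family = gaussian(link = "identity"), data = mtcars)
m2 <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"), data = mtcars)

# One row per model, with AIC, AICc, BIC, R2, RMSE, and Sigma
cmp <- compare_performance(m1, m2)
cmp
```

AIC and BIC are only comparable between models fit to the same observations, so rows with different amounts of missing data should be compared on R2 or RMSE with that caveat in mind.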
Model 5
##
## Call:
## glm(formula = catastrophescore ~ long + roofmateri + rooftree +
## enclosure, family = gaussian(link = "identity"), data = extra)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -75.62 -18.17 -8.69 13.16 82.23
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4460.13497 1149.62714 3.880 0.000107 ***
## long 49.03688 12.76374 3.842 0.000124 ***
## roofmaterishingle -23.29005 1.43088 -16.277 < 2e-16 ***
## roofmaterimembrane 12.42319 2.18262 5.692 1.37e-08 ***
## roofmaterishake -21.90493 4.73851 -4.623 3.94e-06 ***
## roofmateritile -22.84290 7.71403 -2.961 0.003087 **
## rooftree 0.56774 0.06876 8.257 < 2e-16 ***
## enclosureTRUE 44.80193 10.78228 4.155 3.34e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 808.6766)
##
## Null deviance: 3157376 on 3226 degrees of freedom
## Residual deviance: 2603130 on 3219 degrees of freedom
## (10 observations deleted due to missingness)
## AIC: 30774
##
## Number of Fisher Scoring iterations: 2
## GVIF Df GVIF^(1/(2*Df))
## long 1.010340 1 1.005157
## roofmateri 1.020779 4 1.002574
## rooftree 1.012753 1 1.006356
## enclosure 1.004155 1 1.002076
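To show how the coefficients combine, here is a worked prediction for one hypothetical building. The coefficients are copied from the Model 5 summary above; the building itself (longitude -90.07, shingle roof, rooftree of 10, no enclosure) is an assumption made up for this example.

```r
# Coefficients copied from the Model 5 summary
b <- c(intercept = 4460.13497, long = 49.03688, shingle = -23.29005,
       rooftree = 0.56774, enclosure = 44.80193)

# Hypothetical building: long = -90.07, shingle roof, rooftree = 10, enclosure = FALSE
pred <- b[["intercept"]] + b[["long"]] * -90.07 + b[["shingle"]] +
  b[["rooftree"]] * 10 + b[["enclosure"]] * 0
pred
```

The large intercept is mostly cancelled by the longitude term, since longitudes in New Orleans are close to -90. Metal is the reference roof material, so a metal roof would simply omit the shingle term.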
Root mean squared error for Model 4
## [1] 28.59962
Based on Model 5, I have made model predictions:
I then plotted the predicted catastrophe scores alongside the actual catastrophe scores for reference.
The variables included in this dataset were not especially helpful in predicting catastrophe scores accurately, as the graph above illustrates.
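The plotting code is not shown in this report. A predicted-versus-actual plot of the kind described could be sketched as follows, using a toy GLM on the built-in mtcars data in place of Model 5; the column names in plot_df are hypothetical.

```r
library(ggplot2)

# Toy model standing in for Model 5
fit <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"), data = mtcars)

# Pair each observed value with the model's prediction
plot_df <- data.frame(
  actual = mtcars$mpg,
  predicted = predict(fit, type = "response")
)

# Points on the dashed line are predicted exactly; scatter away from it is error
p <- ggplot(plot_df, aes(x = actual, y = predicted)) +
  geom_point(alpha = 0.6) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(x = "Actual", y = "Predicted")
p
```

With a poorly fitting model, the points form a wide band that is flatter than the dashed line, because predictions are pulled toward the mean.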